We have chosen to implement locks using regular files and index nodes. By using a unique naming algorithm, we are able to instantly `know' the name of a lock without having to search for it or ask a manager for the data. This minimizes time consuming calls to the network or to disk.
In order to secure a unique name, we need to provide enough information to be able to identify each atom uniquely. This is presently accomplished by passing a string to the lock function which can be combined with other elements such as the host on which the lock was created and any other relevant information. For convenience we classify atoms with an operator/operand pair. For example, consider a lock request to edit a file. In this case the operator would be ``edit'' and the operand would be the name of the file. The names must be processed to expunge unfortunate characters which would lead to illegal file names.
CanonifyName(char *buffer) { for (sp = buffer; *sp != '\0'; sp++) { if (!isalnum(*sp)) { *sp = '_'; } } }
We use a function CanonifyName(name) which returns a string suitable for use as a filename. It suffices to swap illegal characters for an underscore, for instance.
In order to function properly, the lock-name must be different for each distinct atom, but must be constant over time so that multiple instantiations of the program will always find the same lock. The time information should therefore not be coded into the name of the lock; instead, one relies on the time stamps on the inodes to determine their age. For example, when editing the file /etc/motd on host dax, a lock named
lock.cfengine_conf.dax.editfile._etc_motd
would be created.
We create two kinds of lock within the GetLock() call: a lock
for active threads of execution which blocks multiple instantiations
of a process, and a permanent lock which records the last time at
which the resource was accessed. See figure . The
latter information can be encapsulated in a single inode without using
any disk blocks and provides the information necessary to restrict the
frequency of access. We call this an anti-spamming lock.
If a lock already exists for a specified atom, and that lock has not expired, the atom remains locked and access to the atom is denied. Lock expiry occurs when a certain predefined period of time has elapsed since the active lock was created. In this case, a garbage collection mechanism attempts to carefully eliminate the process attached to the hanging lock (if it still exists) and then remove the old lock, replacing it with a new one and permitting the killing process to take over the task. The third possibility is that no active lock exists for an atom, but that the time since its previous execution is too short. This information is gleaned from the permanent lock. In that case access to the atom is also denied. This feature gives us the `anti-spamming' functionality.
![]() |
A record of these lock transactions is kept for subsequent analysis if required. This indicates when and how locks were created and removed, thereby allowing problem cases, such as locks which always need to be removed forcibly, to be traced.
![]() |
To make the locking behaviour user-configurable we introduce two
parameters called ExpireAfter and IfElapsed, which have
values in minutes. See figure . ExpireAfter describes
the number of minutes after creation at which a lock should expire. It
is measured from the creation time-stamp on the active lock to the
current value of the system clock. IfElapsed describes the
number of minutes after which it becomes acceptable to execute the
same atom again. It is measured from the modification time stamp of
the anti-spamming lock to the current value of the system clock.
We choose to read the current time as a parameter to GetLock(), rather than reading it directly in the locking function, for the following reason. The most correct time to use here could be construed in one of two ways: it could be taken as being the time at which the program was started, or as the exact time at which the lock creation takes place. The difference between these times could differ by seconds, minutes or hours depending on the nature of the job being locked. By using the time at which the program was started for all locks throughout the program, one effectively treats a `pass' of the program as a cohesive entity: if one lock expires for a given value of ExpireAfter, they all expire. A certain ordering of atoms can be preserved. If, on the other hand, one always reads the present value of the system clock directly, the locking mechanism becomes sensitive to the length of time it has taken to execute the different parts of the program. Both policies might be desirable in different situations, so we do not see fit to impose any particular restriction on this.
When a lock has expired, we try to kill the owner process of the expired lock. The process ID of the expired process is read from the active lock. Then the signals CONT, INT, TERM and KILL are sent in that order. On some systems, INT is the only signal that will not hang the process permanently in case of a disk-wait situation, thus INT is sent first. Then the default terminate signal TERM is sent, and finally the non-ignorable signal KILL is sent. Sleep periods of several seconds separate these calls to give the kernel and program time to respond to the signals. The CONT signal is placed first in case the process has been suspended and wants to exit straight away. This should be harmless to non-suspended processes.